Abstract

We describe the winning entry to the Amazon Picking Challenge
2015. From the experience of building this system and competing,
we derive several conclusions: 1) We suggest characterizing
robotic system building along four key aspects, each of them
spanning a spectrum of solutions: modularity vs. integration,
generality vs. assumptions, computation vs. embodiment, and
planning vs. feedback. 2) To understand which region of each
spectrum most adequately addresses which robotic problem, we
must explore the full spectrum of possible approaches. 3) For
manipulation problems in unstructured environments, certain
regions of each spectrum match the problem most adequately and
should be exploited further. This is supported by the fact that
our solution deviated from the majority of the other challenge
entries along each of the spectra. This is an abridged version
of [Eppner et al., 2016].

1 Introduction 

Figure 1: Our robot picks a plush toy during the challenge.

The Amazon Picking Challenge 2015 (APC) tested the ability
of robotic systems to fulfill a fictitious order by autonomously
picking the ordered items from a warehouse shelf (Fig. 1).
The system presented here outperformed the 25 other entries,
winning by a significant margin. In this paper, we provide a
technical description and experimental evaluation of our system.
Our system-building experience led to the following insights:
Robotic systems can be characterized along four key aspects,
each spanning a spectrum of solutions. To develop a shared
understanding of system building, we should explore these
spectra. For manipulation in unstructured environments, we
believe that certain regions of each spectrum match the problem
characteristics most adequately and should be examined by
roboticists with increased emphasis. Those four key aspects are:

A. Modularity vs. Integration: In robotics, success is
determined by the behavior of the entire system, not by
individual modules [Brooks, 1990]. Still, a high degree of
modularity allows breaking down problems into simpler
subproblems. Wrong modularization, however, can make solving
problems unnecessarily difficult. Until we fully understand
which modularization is most adequate for manipulation in
unstructured environments, we suggest building tightly
integrated systems and constantly revising their modularization.

B. Computation vs. Embodiment: Robot behavior results
from the interplay of computation (software) and embodiment
(hardware). Computation is a powerful and versatile tool, but
adapting the embodiment sometimes leads to simple and robust
solutions. We suggest that in manipulation, one should consider
alternative embodiments as part of the solution process.

C. Planning vs. Feedback: Planning performs search in a
world model, leading to verifiable solutions. Feedback from
physical interactions, on the other hand, reduces uncertainty
and allows finding local solutions without expensive
computation. We suggest using planning only when necessary and
exploring feedback as an alternative when the manipulation task
does not require global search.

D. Generality vs. Assumptions: For robotics research, finding
general solutions is highly desirable. However, solving the most
general problem might be unnecessary or even infeasible. We
suggest searching for reasonable and useful assumptions that aid
in solving manipulation problems in unstructured environments.
By extracting, sharing, and revising assumptions that prove
useful for increasingly broad variations of a problem, we will
naturally progress towards a general solution.

These aspects are not novel and certainly will not surprise
the robotic practitioner. However, what should come as a
surprise is the sparsity with which the corresponding spectra
have been explored by our community and how rarely these aspects
are used explicitly to characterize robotic systems. Case in
point: Our solution to the APC explores different regions on
these spectra than most other challenge entries [Correll et al.,
2016]. We believe that these differences were crucial for our
success.

We propose that by making the four key aspects of robotic
systems (and possibly additional ones that we have not yet
identified) explicit, our community will begin to understand
the mapping of problem characteristics to the appropriate
regions on these spectra. Our paper is, of course, only a single
data point in this endeavor. But if our community starts
characterizing robotic systems along the proposed axes, thereby
making design choices transparent and comparable, we might move
towards a scientific theory of system building [Atkeson et al.,
2015; Hawes et al., 2010; Katz and Brock, 2011].

2 The Amazon Picking Challenge 2015 

The APC consists of autonomously picking twelve out of
25 objects from a warehouse shelf and placing them into
a storage container (Fig. 1) within 20 minutes. The robot
knows which objects are contained in each of the shelf’s
twelve bins, but not their exact arrangement. For each
successfully picked target object, the robot receives 10, 15, or
20 points, depending on how many additional objects are
in the same bin. The 25 objects vary widely in size and
appearance.
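
The scoring rule can be sketched as a small function. The mapping from bin occupancy to points used here (10 points for a bin holding only the target, 15 for one additional object, 20 for two or more) is an assumption made for illustration; the text above only states that the score depends on the number of additional objects. Bonuses and penalties are not modeled.

```python
def pick_points(additional_objects: int) -> int:
    """Points for one successful target pick (assumed mapping, see text)."""
    if additional_objects < 0:
        raise ValueError("additional_objects must be non-negative")
    if additional_objects == 0:
        return 10
    if additional_objects == 1:
        return 15
    return 20


def total_score(successful_picks: list[int]) -> int:
    """Sum points over successful picks.

    Each entry is the number of additional objects in that pick's bin.
    """
    return sum(pick_points(n) for n in successful_picks)
```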

3 Technical System Description 

We now describe the hardware and algorithmic components 
of our solution. We will mention connections between our 
design choices and the four key aspects. 

Hardware Components We use a 7-DoF Barrett WAM
mounted on a Nomadic XR4000 mobile base. The inclusion
of holonomic mobility, a choice of embodiment that set our
solution apart from most other entries in the APC, greatly
facilitated the generation of motion. The ability to reposition
the base enabled the arm to easily reach inside all of the bins.
Our end-effector consists of a modified crevice nozzle with
a suction cup mounted at its tip. An off-the-shelf vacuum
cleaner generates sufficient air flow to lift up to 1.5 kg. It
can reliably pick up all challenge objects except for the pencil
cup. Grasping success is rather insensitive to the exact
contact location with the object, leading to reduced
requirements for perception. At the same time, the end-effector’s
thin shape reduces the need for complex collision avoidance,
as it easily fits in between objects, pushing them aside if
necessary. This simple choice for the end-effector illustrates
that an appropriate embodiment simplifies different aspects of
the overall solution, including grasp planning and perception.

Motion Generation Objects are picked using two pre-defined
grasp strategies: a top-down grasp, and a grasp approaching
the object from the side. Both primitives deliberately move
the end-effector into the object and push it against the floor,
walls, or other objects. This is an example of exploiting the
environment to guide manipulation [Eppner et al., 2015] using
haptic feedback. The execution of picking motions is realized
with continuous feedback controllers. We transition between
these controllers based on discrete sensor events. This behavior
can be described by hybrid automata [Egerstedt, 2000]. The
automaton also contains countermeasures for common failures.
The resulting hybrid automaton consists of 26 states and 50
transitions, of which 34 deal with error handling. For example,
if the robot detects an undesired contact with the shelf, it
retracts from the bin and reattempts to pick the object later.
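
The structure of such a hybrid automaton can be illustrated with a minimal sketch: each discrete state stands for one feedback controller, and sensor events trigger the switch to the next one. All state and event names below are hypothetical and far fewer than the 26 states and 50 transitions of the actual system; the continuous controllers themselves are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class HybridAutomaton:
    """Discrete states (one controller each) linked by sensor-event transitions."""
    initial: str
    # Maps (state, event) -> next state.
    transitions: dict = field(default_factory=dict)

    def add(self, state: str, event: str, next_state: str) -> None:
        self.transitions[(state, event)] = next_state

    def run(self, events):
        """Replay a sequence of sensor events and return the visited states."""
        state, visited = self.initial, [self.initial]
        for event in events:
            # An event without a matching transition leaves the
            # currently active controller running.
            state = self.transitions.get((state, event), state)
            visited.append(state)
        return visited


def build_top_down_pick() -> HybridAutomaton:
    """Hypothetical top-down pick with one error-handling branch."""
    ha = HybridAutomaton(initial="approach_bin")
    ha.add("approach_bin", "at_bin", "move_down")
    ha.add("move_down", "contact_object", "activate_suction")
    ha.add("activate_suction", "vacuum_sealed", "retract")
    ha.add("retract", "out_of_bin", "done")
    # Error handling: undesired shelf contact aborts and defers the pick.
    for state in ("approach_bin", "move_down"):
        ha.add(state, "contact_shelf", "retract_and_retry_later")
    return ha
```

Replaying `["at_bin", "contact_shelf"]` through `build_top_down_pick().run(...)` ends in the error-handling state, which mirrors the retract-and-reattempt behavior described above.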

Object Recognition Based on data from an RGB-D camera,
three steps are performed: feature extraction, object
segmentation, and bounding box fitting. The first step extracts a
number of task-specific features for every pixel of the RGB-D
image. These features include information about color,
distance to the tracked shelf model, and height within the bin.
Instead of searching for features that could solve the general
object recognition problem, these task-specific features rely
on strong assumptions (e.g. that objects are placed in a known
shelf). Statistics about these pixel-features for each object are
derived from manually segmented training images and enable
the second step to find the image segment that has the highest
probability of belonging to the target object. This step
exploits additional assumptions to facilitate segmentation, e.g.
that only a small subset of all objects is present in every bin
and that the robot knows which objects these are. The third
step takes the point cloud for this segment and fits a bounding
box. This allows the robot to decide where and from which
direction it should perform the pick. Our object recognition
pipeline is described in [Jonschkowski et al., 2016].
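
The role of the bin-contents assumption can be illustrated with a simplified sketch. It is not the pipeline of [Jonschkowski et al., 2016]: the naive independence between features, the additive smoothing, the per-pixel (rather than per-segment) decision, and the axis-aligned bounding box are all assumptions made here for brevity, and features are assumed to be discretized into a small number of values.

```python
from collections import Counter, defaultdict


def train(labeled_pixels, smoothing=1.0, n_values=8):
    """Estimate P(feature value | object) from (object, feature_tuple) samples.

    n_values is the assumed number of discrete values per feature.
    """
    counts = defaultdict(lambda: defaultdict(Counter))
    totals = Counter()
    for obj, features in labeled_pixels:
        totals[obj] += 1
        for i, value in enumerate(features):
            counts[obj][i][value] += 1

    def likelihood(obj, features):
        # Features are treated as independent given the object.
        p = 1.0
        for i, value in enumerate(features):
            p *= (counts[obj][i][value] + smoothing) / (
                totals[obj] + smoothing * n_values
            )
        return p

    return likelihood


def target_pixels(pixels, likelihood, bin_contents, target):
    """Indices of pixels whose most probable label is the target.

    Only objects known to be in the bin compete for each pixel.
    """
    selected = []
    for idx, features in enumerate(pixels):
        scores = {obj: likelihood(obj, features) for obj in bin_contents}
        if max(scores, key=scores.get) == target:
            selected.append(idx)
    return selected


def bounding_box(points):
    """Axis-aligned bounding box (min corner, max corner) of 3-D points."""
    mins = tuple(min(p[i] for p in points) for i in range(3))
    maxs = tuple(max(p[i] for p in points) for i in range(3))
    return mins, maxs
```

Restricting `bin_contents` to the few objects known to be in the bin is what keeps the per-pixel decision simple: the classifier never has to separate the target from all 25 objects at once.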

4 Evaluation 

The APC provides an in-depth evaluation of our system,
comparing its performance with that of 25 teams from around the
world. We complement these results with nine additional
experiments, using object configurations from the competition.

Quantitative Evaluation During the competition we
scored 148 out of 190 points (2nd: 88, 3rd: 35). We attempted
to pick all twelve objects and were successful for ten. An
average picking motion took 87 seconds. This allowed us to
attempt at most 14 picks within the 20 minutes of challenge
duration.

We reenacted all five shelf configurations that were used
in the challenge for further testing. We performed two trials
per setup, using the robot system from the challenge without
modifications. During 200 minutes the robot picked 95 objects,
of which 85 were target objects. On average we collected
117.6 points (σ = 29.2), which is 62.5% of all available
ones. This shows that the competition run was on the upper
end of the system’s capabilities. Still, only one of the ten
trials (72 points) would have led to our team placing second.

   85   successful picks
   13   object recognition failures
    9   bulky objects stuck at removal
    9   small objects (end-effector imprecision)
    2   displacing objects during approach
    2   meshed pencil cup (suction fails)

Table 1: Failure cases for 120 picking attempts

System Limitations Of the 120 attempted picks, the system
picked ten wrong objects and failed to pick 25 objects.
We think that all failure cases (Table 1) can be addressed by
shifting along the spectra of the proposed aspects.
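
The counts reported in the text and in Table 1 can be cross-checked with a few lines of arithmetic. The dictionary keys simply restate the table rows; nothing beyond the reported numbers is assumed.

```python
# Rows of Table 1: outcome -> number of picking attempts.
outcomes = {
    "successful picks": 85,
    "object recognition failures": 13,
    "bulky objects stuck at removal": 9,
    "small objects (end-effector imprecision)": 9,
    "displacing objects during approach": 2,
    "meshed pencil cup (suction fails)": 2,
}

attempts = sum(outcomes.values())                  # all picking attempts
failures = attempts - outcomes["successful picks"]

picked_total = 95                                  # objects picked in 200 minutes
wrong_objects = picked_total - outcomes["successful picks"]
not_picked = attempts - picked_total

print(attempts, failures, wrong_objects, not_picked)
```

The failure rows add up to the ten wrong objects plus the 25 objects that were not picked, so the table and the text agree.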

Object Recognition Failures: We attribute 13 failed picking
attempts to the object recognition pipeline. These failures
occur when our local features cannot discriminate between
the objects present in the target bin, resulting in wrong object
boundaries or mistaking another object for the target object.
We believe that object recognition can be improved most
effectively by shifting along spectrum A towards tighter
integration, and along spectrum C towards more feedback. We
could reject poses that result in physically implausible
configurations by tighter integration of segmentation and
geometric pose reasoning. Moreover, we could decrease the
likelihood of picking wrong objects by weighing objects or
visually inspecting them after the pick.

Bulky Objects Stuck at Removal: Eight scenarios contained
a large box which could only be removed from the bin by tilting
it. Our system failed on all attempts. The long bottle
brush also got stuck once on the shelf lip and dropped. To
address these failures, we need to shift along spectrum C
towards (motion) planning. Planning would allow us to reason
about how to reorient objects to remove them from the bin.

Small Objects: Out of ten attempts, the robot failed nine 
times attempting to pick up the small spark plugs. In the 
competition run, the robot even picked up a non-target object 
instead. These failures result from the fact that the reaching 
movement is executed open-loop, accumulating a significant 
error. This can be addressed by shifting along spectrum C 
towards more feedback, e.g. by using visual servoing. 
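
The difference between open-loop execution and feedback can be illustrated with a one-dimensional toy model. The error model (an actuator that realizes only a fixed fraction of each commanded step) and the proportional gain are assumptions chosen for illustration, not a description of the actual robot.

```python
def open_loop_reach(target, steps, gain_error):
    """Execute a precomputed sequence of equal steps without sensing."""
    position = 0.0
    command = target / steps
    for _ in range(steps):
        # The actuator realizes only a fraction of each command,
        # so the error accumulates over the whole motion.
        position += gain_error * command
    return position


def closed_loop_reach(target, steps, gain_error, k=0.5):
    """Re-measure the remaining error at every step and correct it."""
    position = 0.0
    for _ in range(steps):
        error = target - position  # assumed to be observed without noise
        position += gain_error * k * error
    return position
```

With `target=1.0`, `steps=20`, and `gain_error=0.9`, the open-loop motion stops about 10% short of the target, while the closed-loop motion converges to it despite the same actuation error.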

Displacing Objects: In five out of ten attempts, the robot
toppled over the glue bottle. The bottle then required a
reattempt from the top. In two cases the robot did not have
enough time for a reattempt and lost points. As before, this
failure case can be alleviated by additional feedback; tumbling
could be detected earlier and lead to a different strategy.

Pencil Cup: The meshed metal pencil cup does not have 
enough solid surface to pick it with suction. This failure mode 
shows a limitation of our chosen embodiment (Sec. 5.2). It 
suggests possible extensions to our end-effector, e.g. adding 
a mechanical or magnetic gripper. 

5 Key Aspects of Building Robotic Systems 

We will now generalize our experience from the APC to
building robotic systems in general. For each aspect we
present arguments and examples for both ends of the spectrum
and position our system on it.

5.1 Modularity vs. Integration 

There is a continuum between tightly integrated and modular 
solutions which has been investigated in systems engineering, 
computer science and product management. 

Modularity Building systems of arbitrary complexity
without structuring them into modules is difficult. Modularity
decomposes complexity by breaking down a problem
into smaller sub-problems that can be solved and tested
individually. Because of the power of compositionality, building
modular systems is the prevalent paradigm in robotics.
This is reflected in the separation of robotics into the
classical fields of perception, planning, control, etc. as well
as in the produced software. For example, high modularity is one
of the core concepts of ROS [Quigley et al., 2009], a popular
framework for implementing robotic systems. Similarly,
libraries like OpenCV, PCL, and MoveIt! represent commonly
employed modules for computer vision and planning.

Integration Robotic systems generate behavior as a result
of integrating many software and hardware components
[Brooks, 1990; Cohen, 1996]. The usefulness of a
robotic system is determined by the performance of the
integrated system, rather than by the performance of individual
components. To ensure that the performance of the entire system
is maximized, and to avoid making wrong commitments
or addressing sub-problems that are unnecessarily difficult,
all components of the system should be chosen to maximally
exploit potential synergies between components. To identify
these synergies in the absence of established system-building
guidelines requires early integration [Johnson et al., 2015;
Katz and Brock, 2011]. Important advances were achieved by
overcoming existing “modularizations”, e.g. by combining
interaction and perception [Bohg et al., 2016; Martin-Martin
and Brock, 2014].

Our Design Choice on the Spectrum Our system used 
ROS [Quigley et al., 2009] and relied on various standard 
modules, e.g. for visual processing and navigation. However, 
we embraced tight integration at various levels. We integrated 
planning and control using hybrid automata, adapted our 
picking strategies to the embodiment and the requirements 
for object recognition to the picking strategies. Furthermore, 
we adapted many ideas from agile development [Schwaber, 
2004]: rapid prototyping, early and continuous integration, 
adversarial testing, and shared knowledge. 

5.2 Computation vs. Embodiment 

The idea that mechanisms and materials exhibit behavior that
is normally attributed to computation is known as
morphological computation [Pfeifer and Gomez, 2009].

Computation Since computation is more flexible and can
be altered easily, compared to the embodiment (hardware), it
allows building highly complex systems with diverse behaviors.
Purely computational approaches to robotics [Coleman,
2015; LaValle, 2006] also have the advantage of potentially
being hardware-agnostic. Many examples of computation-focused
approaches exist in robotics, e.g. posing grasping as
a contact point planning problem [Miller and Allen, 2004]. This
formulation is appealing because it abstracts the problem away
from the hand and the environmental context. Similarly,
perception has been traditionally seen as a passive, purely
computational problem [Marr, 1982].

Embodiment Tailoring the hardware to a particular problem
can reduce the required computation. Hardware solutions
are often simple and robust, especially when uncertainty
is present. Grasp planning can benefit substantially
from embodiment, as exemplified by simple under-actuated
robotic hands [Deimel and Brock, 2016; Dollar and Howe,
2010]. Appropriate embodiment also facilitates perception,
e.g. adaptive hands reduce the need for accuracy in object
pose estimation. Moreover, placing a vision sensor on the
robot arm increases the sensor’s field of view and reduces the
effect of occlusions [Aloimonos et al., 1988].

Our Design Choice on the Spectrum We reduced the need 
for computation by using a suction cup. The reduced number 
of degrees of freedom simplified grasp planning and object 
pose estimation. We also reduced the need for computation 
by increasing the number of degrees of freedom by mounting 
the robot arm on a mobile base. This allowed us to generate 
motion mostly through feedback control, rather than resorting 
to motion planning. However, in the APC we failed to pick 
the pencil cup due to our chosen embodiment. 

5.3 Planning vs. Feedback 

Classical robotics and AI employed the sense-plan-act 
paradigm, assuming the robot can build a perfect model of 
the world. The difficulty of obtaining such models initiated a 
shift towards feedback-driven approaches [Brooks, 1990]. 

Planning Planning finds global solutions, where controllers
based on local feedback would fail. The most common
application in robotics is motion planning [LaValle,
2006]. Practitioners need to provide models of the environment,
calibrate the robot, and localize it in the environment
[Thrun et al., 2005]. Under these prerequisites, motion
planners serve as general and versatile black-box solvers.

Feedback If global search is not required or not possible,
feedback control based on task-relevant features is often
sufficient to generate successful robot motion. Feedback
can be exploited in the visual [Espiau et al., 1992] or contact
domain [Lozano-Perez et al., 1984; Eppner et al., 2015] to
simplify manipulation tasks. Feedback approaches are
particularly useful in the presence of uncertainty, high
dimensionality, long time horizons, and inaccurate models. In
these cases, planning would be computationally demanding and
often intractable [Papadimitriou and Tsitsiklis, 1987].

Our Design Choice on the Spectrum Our system relies on
very simple planning. We use on-line grasp approach planning
and execute the motions using pre-defined, feedback-guided
motion primitives, avoiding configuration-space motion
planning altogether. This positions our solution far to the
feedback side of the spectrum, in contrast to the majority of
other challenge entries (80% of the teams used motion planning
and 44% used MoveIt! [Sucan and Chitta, 2016], as reported in
[Correll et al., 2016]). Feedback control is so successful in
the APC setting because the task only requires a limited range
of motions and the shelf provides plenty of contact surfaces.
However, some shortcomings of our system, such as the lack of
in-bin reorientation of objects, should be addressed by some
form of planning.

5.4 Generality vs. Assumptions 

This spectrum is reflected in the no free lunch theorem
[Wolpert, 1996], the bias-variance trade-off [Hastie et
al., 2005], and the balance between generality and specificity
in system design [van Gigch, 1991].

Generality Finding general solutions is an important goal
in robotics. A general solution applies to a wide range of
problems and reflects a deep understanding of the problem
at hand. In contrast, solutions strongly tailored to a specific
problem instance (e.g. a robot demo) might not lead to
insights or contribute to a broader understanding of system
building. A number of general approaches have been successfully
applied in robotics, e.g. task-generic planning algorithms
such as A* are widely used for mobile robot navigation.
Recursive Bayesian estimation is a very generic framework that
has helped solve many different problems in robotics.
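
To make the generality of recursive Bayesian estimation concrete, the following minimal sketch implements its two steps for a discrete state space. The discrete (histogram) representation and the function names are choices made here for illustration; a particle filter approximates the same recursion with samples.

```python
def predict(belief, transition):
    """Propagate the belief through a motion model.

    transition[i][j] is P(next state = j | current state = i).
    """
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]


def update(belief, likelihood):
    """Weight the belief by the measurement likelihood P(z | x) and renormalize."""
    posterior = [b * l for b, l in zip(belief, likelihood)]
    total = sum(posterior)
    if total == 0.0:
        raise ValueError("measurement has zero probability under the current belief")
    return [p / total for p in posterior]
```

Nothing in these two functions is specific to a task; all problem knowledge enters through the transition and measurement models, which is what makes the framework reusable across problems.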

Assumptions In machine learning, search, and optimization,
the no free lunch theorems prove that no problem can
be solved without making appropriate assumptions [Wolpert,
1996; Wolpert and Macready, 1997]: averaged over all possible
problems, there is no method that outperforms random
guessing. The only way to improve on random guessing
is by making assumptions about the problem. We believe
that problems in robotics are characterized by a significant
amount of recurring underlying structure. For example, in motion
planning, adding information about workspace connectivity
can reduce the computational complexity by up to three orders
of magnitude [Rickert et al., 2014]. In reinforcement
learning, adding explicit knowledge about physics makes
the learning problem tractable by reducing its dimensionality
[Jonschkowski and Brock, 2015; Scholz et al., 2014].
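
For reference, the no free lunch theorem for optimization is commonly stated as follows. The notation is that of the usual formulation and is not introduced elsewhere in this paper: f ranges over all objective functions on a finite search space, d_m^y is the sequence of m cost values observed so far, and a_1 and a_2 are any two search algorithms.

```latex
\sum_{f} P\left(d_m^{y} \mid f, m, a_1\right)
  \;=\;
\sum_{f} P\left(d_m^{y} \mid f, m, a_2\right)
```

Summed over all possible objective functions, every algorithm is equally likely to produce any given sequence of observed costs, so an advantage on one class of problems must be paid for on another.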

Our Design Choice on the Spectrum Our APC system
used available (general) solutions whenever they proved
sufficient to solve the problem. Since we could not find existing
approaches for reliably locating the target object, we used
various assumptions to simplify the problem, e.g. that the
objects are placed in a known shelf or that the contents of each
shelf bin are known and therefore only a small number of objects
need to be considered. Our general solutions included a
particle filter for localizing the robot, and standard joint and
operational space controllers for motion generation.

6 Conclusion 

We presented and evaluated our winning system for the
2015 APC. To describe the system, we proposed four key aspects
of system building. A systematic description of robotic
systems according to these aspects (and additional ones proposed
by others in the future) will facilitate accumulating
general knowledge for building robotic systems.

Our lessons on building robotic systems are consistent with
those others derived from their experience in similar robotics
challenges. In the areas of autonomous driving [Buehler et
al., 2007] and humanoid robotics [Atkeson et al., 2015], the
resulting insights have led to significant advances. We hope
that the APC and our lessons learned will be equally useful
for manipulation.